Learning Phenotype Mapping for Integrating Large Genetic Data
Abstract
Accurate phenotype mapping will play an important role in facilitating Phenome-Wide Association Studies (PheWAS), and potentially other phenomics-based studies. The PheWAS approach tests for associations between genetic variants and a broad range of phenotypes in a high-throughput manner, to better understand how genetic variation affects multiple phenotypes. Here we define the phenotype mapping problem posed by PheWAS analyses, discuss its challenges, and present a machine-learning solution. Our key ideas are the use of weighted Jaccard features and term augmentation by dictionary lookup. Compared with features based on string similarity metrics, our approach improves the F-score from 0.59 to 0.73. With augmentation, the F-score improves further to 0.89. For terms not covered by the dictionary, we apply transitive closure inference and reach an F-score of 0.91, close to a level sufficient for practical use. We also show that our model generalizes well to phenotypes not used in the training dataset.
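The abstract names three ingredients but does not define them: weighted Jaccard features, dictionary-based term augmentation, and transitive closure inference. The Python below is a minimal sketch under our own assumptions, not the authors' implementation. It makes three choices:

- Jaccard similarity is computed over lower-cased token sets, using IDF-style token weights.
- `HYPOTHETICAL_DICT` is a made-up stand-in for the lookup dictionary.
- Transitive closure is taken over predicted match pairs using union-find.

All function names are hypothetical. In the paper's setting, scores like these would serve as features for a trained classifier, which is not shown here.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical abbreviation dictionary; a real system would use a curated
# vocabulary. These entries are illustrative only.
HYPOTHETICAL_DICT = {
    "htn": "hypertension",
    "bp": "blood pressure",
    "mi": "myocardial infarction",
}


def tokenize(term):
    """Lower-case a phenotype term and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", term.lower()))


def idf_weights(terms):
    """Assumed IDF-style token weights: rarer tokens get larger weights."""
    n = len(terms)
    df = Counter()
    for term in terms:
        df.update(tokenize(term))  # each token counted once per term
    return {tok: math.log(n / d) + 1.0 for tok, d in df.items()}


def weighted_jaccard(a, b, weights, default=1.0):
    """Weight of the shared tokens divided by the weight of the token union."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0
    shared = sum(weights.get(t, default) for t in ta & tb)
    total = sum(weights.get(t, default) for t in union)
    return shared / total


def augment(term, dictionary):
    """Append the dictionary expansion of any abbreviation tokens in the term."""
    extra = [dictionary[t] for t in sorted(tokenize(term)) if t in dictionary]
    return " ".join([term, *extra])


def transitive_closure(pairs):
    """Infer a~c from a~b and b~c by merging matched terms with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    # Group terms by root; every pair inside a group is an implied match.
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return {pair for members in groups.values()
            for pair in combinations(sorted(members), 2)}


if __name__ == "__main__":
    # Without augmentation the abbreviation shares no tokens with the full term.
    print(weighted_jaccard("HTN", "hypertension", {}))
    print(weighted_jaccard(augment("HTN", HYPOTHETICAL_DICT), "hypertension", {}))
    print(sorted(transitive_closure([("SBP", "systolic BP"),
                                     ("systolic BP", "systolic blood pressure")])))
```

Union-find fits here because the closure of a symmetric "same phenotype" relation is simply its set of connected components. The trade-off is that a single false-positive match can merge two otherwise separate groups.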
Similar resources
Gene mapping in white spruce (P. glauca): QTL and association studies integrating population and expression data
Background: Connecting phenotype with genotype is the basis for developing forest genetic applications such as marker-assisted selection (MAS). Quantitative Trait Locus (QTL) mapping and genetic association mapping (or linkage disequilibrium (LD) mapping) are two major approaches to find genes that control phenotypes of interest in forest trees. Quantitative trait loci (QTL) and association mapping exper...
Exploring Label Dependency in Active Learning for Phenotype Mapping
Many genetic epidemiological studies of human diseases have multiple variables related to any given phenotype, resulting from different definitions and multiple measurements or subsets of data. Manually mapping and harmonizing these phenotypes is a time-consuming process that may still miss the most appropriate variables. Previously, a supervised learning algorithm was proposed for this problem....
Efficient Algorithms in Analyzing Genomic Data
Feng Pan: Efficient Algorithms in Analyzing Genomic Data. (Under the direction of Wei Wang.) With the development of high-throughput and low-cost genotyping technologies, immense data can be cheaply and efficiently produced for various genetic studies. A typical dataset may contain hundreds of samples with millions of genotypes/haplotypes. In order to prevent data analysis from becoming a bottl...
Structural mapping: how to study the genetic architecture of a phenotypic trait through its formation mechanism
Traditional approaches for genetic mapping are to simply associate the genotypes of a quantitative trait locus (QTL) with the phenotypic variation of a complex trait. A more mechanistic strategy has emerged to dissect the trait phenotype into its structural components and map specific QTLs that control the mechanistic and structural formation of a complex trait. We describe and assess such a st...
Genetic Variations in Exon 3 of VWF Gene in Patients with Von Willebrand Disease (VWD) from South-West Iran
Background: Von Willebrand disease (VWD) is an autosomally inherited bleeding disorder with a prevalence of 1% based on population studies. The disease phenotype is due to quantitative and structural/functional defects in Von Willebrand Factor (VWF), a glycoprotein with an essential role as a carrier of FVIII in circulation that also serves as a hemostasis regulato...
Publication date: 2011